home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
Celestin Apprentice 5
/
Apprentice-Release5.iso
/
Source Code
/
Libraries
/
Sherlock 2.0
/
Mac v2.0 docs
/
Text Docs
/
Mac DevLIb .txt
< prev
next >
Wrap
Text File
|
1996-04-05
|
20KB
|
504 lines
The DevLib Development Library
This Chapter describes the various files in the folder DevLibSrc. The development library,
DevLib, provides a set of common functions used by a C compiler, assembler, linker and
debugger, as well as SPP, SDEL and SDIF. These routines are not specific to the Macintosh: they
may be of interest to C programmers using a wide variety of environments.
DevLib defines a set of classes. Each class is represented by a .h file and one or two .c files. The
.c files represent the “base” methods of each class. When a member (method or data) of a class
must be over-ridden by an application, that member is not defined in DevLibSrc. Rather, it is
defined in an application-specific, say XXyy.c file. That is, XX stands for the name of your
application, while yy stands for the particular class.
The externally visible public members (functions and data) of any class are exactly those functions
and data declared in the .h file for the class. In particular, each .h file contains a function prototype
for each public method.
The library consists of the following classes and auxiliary headers. Classes are identified by the
name of its .c file. The .c file typically contains more complete documentation than the
corresponding .h file.
The LIBmem class is particularly interesting. It describes an optimal method of allocating memory
based on so-called lifetime classes. Also, the LIBobj class describes replacements for malloc and
free. These classes complement each other; lifetime classes ultimately call malloc to get blocks of
memory.
LIBcmnd.c
This class parses a disk parameter file into a set of argv arguments. I use parameter files
containing hundreds of Sherlock arguments. Not only do the arguments control the program, they
provide a record of what I am working on.
The init_args function is most important routine of this class. init_args reads a text file, parses it
into separate arguments, allocates an argv vector, fills in the argv vector with pointers to the
arguments, and computes the value of argc.
LIBcvt.c
This class converts various kinds of variables into strings; it is a type-safe replacement for sprintf.
This class guards against overwriting the sprintf buffer. Most routines in this class require the
caller to supply a buffer into which the resulting string is put. The routines checks that the string
does not overflow the buffer. These checks are not foolproof because they rely on the stated size
of the buffer passed by the caller
This class is used mainly by the stream output routines in LIBes.c. Once in a while, however,
these routines are useful in their own right.
The compile-time constant CVT_USE_SPRINTF allows this file to avoid all calls to sprintf. This
was useful when bootstrapping a C compiler.
LIBend.c and XXend.c
This class handles end-of-program processing. Each application must define its own
end_close_all method, in an application-specific file, XXend.c.
LIBenv.c
This class reports on various system-dependent quantities, such as heap statistics that may be kept
by the operating system itself. At present, the code reports on the status of Macintosh heap zones.
This class is not defined for Power Macs.
The files LIBes.c and XXes.c
This class contains many stream output routines that are type safe replacements for fprintf. SPP,
SDEL and SDIF contain many examples of how to use these routines.
All output from these routines eventually passes through the es routine, which is not defined in
LIBes. Instead, each application is supposed to “override” es by providing its own version of es,
in an application-specific file, XXes.c. On the Macintosh es calls log_sout in LIBlog.c; on other
machines es could call fprintf(stderr, “%s”, s);
Each application must also provide definitions for the following routines: ecnl, ecnls, ecblanks,
es_assert_failed and es_internal_err. See the file SPPes.c for an example of the per-application
definitions of these routines.
The ecnl routine deserves special mention. It outputs a conditional newline. That is, it outputs a
newline only if the last character output was not a newline. Conditional newlines make it easy to
generate correct spacing without requiring strict coding conventions.
Short function names are very important here. In particular, es, enl, ecnl and eint must have short
names to be convenient to use. In effect, these names are all variants of the C++ << operator.
LIBio.c
This class provides slightly enhanced open, close, read and write routines. This file was more
useful before the ANSI Standard defined the library rigorously, but it is still sometimes useful to
isolate system dependencies.
LIBlib.c
This file is not a class. Rather, it specifies application-specific information to the library. Make a
separate copy of this file with a separate name, say XXlib.c, for each application. Do not use
LIBlib.c. It is only a template.
This is a simple, yet crucial file. Using variables to contain constant information is an Aha that
makes it possible for different programs to share the same source code. In effect, the linker injects
these constants into the source code of the library.
LIBlist.h
This class consists only of macros; there is no corresponding .c file. This file contains type-safe
macros for list processing. Using functions instead of macros would require type-defeating casts
to void *. Macros can avoid such casts. Think of these macros as a kind of C++ template.
Lifetime memory allocation reduces the need for these macros. (See LIBmem.h for a discussion
of lifetime allocation.) Elements of lists almost always have the same lifetime, so it is seldom
necessary to free list elements explicitly. With lifetime allocation, the list head is freed at the same
time all the nodes are freed. Most list processing simply disappears!
LIBlog.c
This class handles output to a log file and window. Usually all output from es is directed here.
Distinguishing this class from the class that prints the error stream simplified the code greatly.
By convention, Sherlock opens a log file when it sees a Sherlock argument of the form
++>>file_name.
Only one log file can be open at a time.
LIBmem.c, LIBmem.h, XXmem.c and XXmem.h
This class implement what I call the lifetime method of storage allocation. I have used this method
of storage allocation in an ANSI C compiler that I recently completed. It is not used in SPP, SDIF
or SDEL. I describe it here because it is an important part of DevLib.
Defining lifetimes
Let us define the lifetime of a heap object to be the time (point in the code) at which the object is
freed. This definition is deceptively simple. Note the following:
• The lifetime of an object does not depend on when the object is created, only when it is freed.
• The type of an object does not influence its lifetime. Objects of different types and sizes may
have the same lifetime.
• The lifetime of an object is determined when the object is created. It does not change.
Implementing lifetimes
Arbitrarily many lifetimes may exist in a program. A global variable points at the head of the list
of lifetimes. Position within this list has no significance. A dynamically allocated node describes
each lifetime. Each node contains the following information:
• The name of the lifetime. This name is used to create column headers in dumps of statistics.
• A pointer to the head of a list of blocks comprising the memory used by the lifetime. The first
block on the list is the current block. Memory is always allocated from the current block. A new
block is added to the list whenever the current block does not contain enough free memory to
handle a request for more memory.
• The number of free bytes in the current block and a pointer to the first free byte.
• A pointer to the head of a list of per-type statistics nodes for this lifetime. Each statistic node
describes the statistics for objects of the same type in this particular lifetime. Statistics nodes
contain statistics and a type name used in dumps.
• A pointer to the next lifetime on the global list of all lifetimes.
Allocating memory
I allocate memory use a variety of macros. The file LIBmem.h contains the basic macros. The
new_macro allocates enough bytes to hold an object whose pointer is given. For example:
my_type * p;
new_macro(p, perm_life, perm_type_stat);
perm_life is a pointer to a lifetime node and perm_type_stat a pointer to a statistics node attached to
perm_life.
For safety, I would define an abbreviation macro defined in an application-defined header, say
XXmem.h. A typical abbreviation macro would be:
#define new_perm_my_type_macro(p) \
new_macro(p, perm_life, perm_my_type_stat)
This macro ensures that the proper statistic node is used with the proper lifetime. In other words,
abbreviation macros represent associations, in this example the association of perm_my_type_stats
with perm_life.
The new_macro is defined as follows in LIBmem.h:
#define new_macro(p, life, stat) \
new_size_macro(p, sizeof(*p), life, stat);
new_macro computes the number of bytes to be allocated using sizeof(*p). This will work
regardless of what p points to. The compiler will complain if p is not, in fact, a pointer. This
macro is type safe and bullet-proof.
All the real work is done by the new_size macro. The actual definition of this macro is a bit
complicated: see LIBmem.h. A simplified form of the macro is as follows:
#define new_size_macro(p, size, life, stat) \
// update statistics for stat \
if (size > life -> mem_avail) { \
p = mem_new_block(size_, life); \
} \
else { \
p = life -> mem_ptr; \
life -> mem_ptr = life -> mem_ptr + size; \
life -> mem_avail -= size_; \
} \
}
Usually, the current block will contain more than size bytes. In that case, we execute the
following instructions:
p = life -> mem_ptr;
life -> mem_ptr = life -> mem_ptr + size;
life -> mem_avail -= size;
Clearly, it's not possible to allocate memory more quickly than these 3 instructions..
When the current block does not contain enough free memory for the requested item, the macro
call mem_new_block to get a new block of memory. mem_new_block is defined in LIBmem.c.
It sets life -> mem_ptr and life -> mem_avail to describe the newly allocated block.
mem_new_block will usually allocate a standard-sized block (like 1024 bytes), but can allocate an
arbitrarily large block if the requested number of bytes would not fit in a standard-sized block.
new_size_macro is the fastest possible way to allocate memory. Indeed, it doesn't matter how fast
mem_new_block is! The reason is simple: the current block will usually be big enough to contain
the next item, assuming that most allocated items are small compared to the standard block size.
We can arbitrarily decrease the cost of calling mem_new_block by increasing the size of the
standard block.
So much for allocating memory. Deallocating memory is even more efficient. As a consequence
of the definition of a lifetime, all items with the same lifetime are deallocated at once. Deallocating
a lifetime is trivial: we simply free all of the blocks of the lifetime.
LIBobj.c and LIBobj.h
This class provides more robust versions of calloc and free. This class has a long history: the
present version if much easier to use and more powerful than previous versions. This class is
probably more powerful than some of the malloc replacements being sold today.
Access this class using the obj_new_macro or obj_free_macro, defined in LIBobj.h. A third
macro, obj_new_var_tag_macro is much less often used, though it is sometimes essential.
The compile-time constant PRODUCTION controls these macros. When PRODUCTION is
#defined, the macros expand essentially to calloc and free. Otherwise, the macros expand to
routines that provide memory protection and statistics gathering capabilities.
These routines use a string as a type name without sacrificing efficiency. I call this the Sherlock
trick. How does this work? Let’s look at the definition of the debugging version of
obj_new_macro:
#define obj_new_macro(the_mem, size, tag) { \
static type_des * tp_ = NULL; \
the_mem = obj_new_db(size, tag, &tp_); \
}
The first time a particular instance of this macro is called, the static variable tp_ is NULL, so
obj_new_db searches a hashed list of all types it knows about for the tag. If it is found, it sets tp_
to point to the type_des describing the type. Otherwise it creates a type_des describing the new
type and points tp_ to it. The second time a particular instance of this macro is called, tp_ has
already been set, and obj_new_db can access the type_des directly without any searching at all.
Notice: it is never necessary to initialize a type_des; obj_new_macro does it automatically. The
tag string is enough to initialize the new type. In effect, the tag is a type. So to create a new type,
just call obj_new_macro with a new string argument. The list of tag names is searched only once
per instance of the macro, so the time overhead is not noticeable.
Usually the tag argument is a normal, statically allocated string. The Sherlock trick won’t work if
the tag argument is a variable because tp_ must then be recalculated on each call. In those rare
cases, use obj_new_var_tag_macro. This macro essentially sets tp_ every time it is called.
This class can produce a formatted report of statistics for all objects, similar to the report provided
by the LIBmem class. LIBmem allocates memory in blocks, so only those blocks will show up in
the report created by LIBobj.
When PRODUCTION is not defined, all memory is preceded and followed by fill bytes that are
filled with a known pattern. Most memory overwrites can easily be found by checking these fill
bytes.
When the ++obj_v Sherlock trace is enabled all fill bytes are checked any time a new object is
allocated or freed. Obviously, this takes a lot of time, and it is very useful in finding memory
problems.
When the ++obj_watch Sherlock trace is enabled all fill bytes are put on a Sherlock watch list.
(I’ll discuss watch lists below). This watch list is checked whenever a Sherlock macro is
executed, so all fill bytes are checked whenever a Sherlock macro is executed. This option takes
even more time than the ++obj_v option, but finds memory problems quickly.
Regardless of the settings of ++obj_watch and ++obj_v, the fill bytes of an object are checked
when it is freed.
LIBos.c
This class provides an output stream. There are two kinds of routines in this class. The out_bytes
routine provides non-text output. All other routines provide text output. All text output is
funneled through the os routine, so redirecting output is feasible and simple.
Sherlock (sl_xxx.c and sl.h)
Sherlock is a way of enabling and disabling code at run time. Sherlock consists of a large set of
macros such as TRACE. TRACE(a,b) executes the code b if the string a has been enabled from
the command line. For example,
TRACE("obj_v", obj_check_all());
This macro executes obj_check_all if ++obj_v has been selected on the command line. If the
compile-time constant called SHERLOCK is not #defined, all Sherlock macros expand to do-
nothing code.
The second parameter to the TRACE macro can be any sequence of C source statements. The
following is perfectly valid:
TRACE("xyz",
{
int i;
for(i = 0; i < limit; i++)
fprintf(stderr, "array[%d]=%d\n", i, array[i]);
}
);
In particular, the commas in the call to fprintf do not delimit macro parameters because they are
inside parentheses. About the only thing you can't put inside a Sherlock macro is an initializer
containing commas.
The Sherlock trick is important here. Without it, we would have to declare and initialize a variable
for every different tag, xyz in the example above. This would be intolerable. The Sherlock trick
allows us to dispense with tracing variables! My programs have hundreds of tags that can be
enabled or disabled independently. None of these tags require any initialization on my part. The
only initialization is a single call to parse the Sherlock arguments and remove them from the argv
list so they don’t interfere with the program’s usual arguments.
My preferred style of using Sherlock has changed a bit since Sherlock was first released:
• I no longer use the RETURN_xxx family of macros. These turned out to be too clumsy. A
typical contains a single exit point, immediately preceded by a STATX or TRACEPX macro. I
prefer to use goto done to indicate a return from a function. This complicates some functions, but
not much.
• I often use STATB as the first Sherlock macro. That way, the timing statistics of the routines are
not influenced by the tracing that would be produced by TRACEPB.
Watch lists are new. The SL_WATCH macro places a node describing an area of memory and its
contents on a list. This list is checked whenever a Sherlock macro is executed. I place Sherlock
macros at the start and end of almost all my functions, so this watch list is examined at the start
and end of each function. We can catch memory being trashed within the function that trashed it,
yet the overhead of examining the watch list is much less than if it were examined after every
instruction. By the way, only a few dozen lines of code in sl_macro.c are needed to implement
watch lists.
LIBtypes.h
This file contains the typedefs for types that are shared among the files of DevLib. The actual
definitions of the structs need not appear in this header file! This has the following advantages:
• It allows struct A in a header file H to refer to struct B even though the definition of struct B does
not appear in H.
• Include files can be included in any order because all necessary typedefs are in LIBtypes.h.
• Typedefs change rarely, and LIBtypes.h needs change only if a new global type is created.
• Include files can correspond closely to C++ classes. That is, the declarations of derived classes
need not be contained in the header that contains the declaration of the base class. Header files can
describe a single class, whether or not that class is derived from another class.
There are a few drawbacks to using LIBtypes.h, none of which are significant for small and
medium sized programs.
• The typedefs in LIBtypes.h are global, even though they may be used only in one or two files.
• All library files depend on LIBtypes.h, so they are a bit less self contained.
Possible Improvements
I keep finding new uses for the Sherlock trick. One possibility would be to create stat nodes
automatically by defining the macro MEM_UPDATE_STATS in LIBmem.h as follows:
{ \
static mem_stat_node * h = NULL; \
if (h == NULL) h = mem_init_stats(tag); \
// update h as usual \
}
This would eliminate the need to define and initialize statistics variable. In particular, you
wouldn’t need to change the application-specific version of LIBmem.h to create a new kind of
node. Very convenient.
This is a bit slower than the old scheme. Ignoring (as we should) the time to execute
mem_init_stats once, the test (h== NULL) almost doubles the time to execute the macro.
However, statistics are not typically kept in production code, so the speed penalty is usually
negligible.